title

Created by:

  1. The Beginning
    1. Important imports
    2. Original data
    3. Selecting correlated features
  2. Precise analysis of important attributes
    1. Parallel Coordinates Plot
    2. Heat map
    3. The influence of markers on the outcome
    4. How markers are correlated
  3. Other features
    1. Facet Scatter plot
    2. Normal Scatter plot
    3. Violin plot
    4. Checking hypothesis

The Beginning


Important imports


Original data

We start by reading the data and transforming it into something useful. We group the records by patient ID and use the mean of each feature as that patient's new 'main' value.
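
The grouping step can be sketched as follows. This is a minimal illustration on made-up data, assuming pandas; the column names (`PATIENT_ID`, `age`, `neutrophils_pct`, `outcome`) are hypothetical and may not match the real dataset.

```python
import pandas as pd

# Hypothetical toy data: several blood samples per patient.
samples = pd.DataFrame({
    "PATIENT_ID": [1, 1, 2, 2, 2],
    "age": [60, 60, 45, 45, 45],
    "neutrophils_pct": [80.0, 90.0, 55.0, 60.0, 65.0],
    "outcome": [1, 1, 0, 0, 0],
})

# One row per patient: every column becomes the mean over that patient's samples.
patients = samples.groupby("PATIENT_ID").mean()
print(patients)
```

Averaging collapses repeated measurements into a single row per patient, at the cost of hiding how a value changed over the hospital stay.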


Selecting correlated features

At the beginning of our adventure we have to decide which features are the most interesting. We select the features most correlated with the outcome: the selected features have a correlation with the outcome higher than 0.5.
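
A minimal sketch of this selection on made-up data, assuming pandas. The column names are hypothetical, and comparing the absolute value of the correlation against 0.5 is an assumption, made so that strongly negative correlations are kept as well.

```python
import pandas as pd

# Hypothetical per-patient table; "noise" is deliberately unrelated to the outcome.
patients = pd.DataFrame({
    "outcome":     [0, 0, 0, 1, 1, 1],
    "neutrophils": [50, 55, 60, 85, 90, 95],
    "lymphocytes": [35, 30, 28, 8, 5, 4],
    "noise":       [1, 5, 2, 4, 1, 5],
})

# Pearson correlation of every feature with the outcome column.
corr_with_outcome = patients.corr()["outcome"].drop("outcome")

# Assumption: threshold on the absolute value, so negative correlations count too.
selected = corr_with_outcome[corr_with_outcome.abs() > 0.5]
print(selected)
```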

From the "Most correlated functions" bar chart, we can easily read off the blood features most strongly linked to the patient's death.

The strongest correlations belong to the percentages of lymphocytes and neutrophils. The percentages of monocytes and eosinophils also appear, at the bottom of the list. However, the only absolute count on the chart is the neutrophil count, which may suggest that neutrophils are the white cell type that matters most, and that the correlations of the other white cells are only incidental.

It is also worth noting that age appears on the list, which suggests that some part of society is more vulnerable.

Precise analysis of important attributes

Parallel Coordinates Plot

A parallel coordinates plot illustrating the dependency between the blood features whose correlation with the outcome is higher than 0.7.

Thanks to the "parallel coordinates" graph, we can easily isolate patients within a certain range of attribute values. When a range is selected on an axis, only the patients within that range are highlighted. To reset the range, double-click the selected axis.


Heat map

While analyzing the markers for COVID detection, we decided to examine the correlation between them and the outcome, and then check how age affects the values of the blood features.

The influence of markers on the outcome

Based on the previous correlation heatmap, we decided to examine the influence of HSC on the outcome together with age and gender.

In a healthy person the concentration is low and does not exceed 5 mg/L, but in COVID patients "HSC" increases strongly. If we select the range 0-50 mg/L, the mortality rate across all patients is only about 0.15. However, if we choose a higher HSC range, the mortality rate increases significantly. By clicking on the red or green bar, we can easily separate the patients who died from those who recovered, and observe that mortality increases drastically with age and with the concentration of "HSC".
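
The range selection can also be reproduced numerically. The sketch below uses made-up values and assumes pandas; the column name `hsc`, the helper `mortality_rate`, and the coding of `outcome` (1 = died, 0 = recovered) are all assumptions for illustration, so the resulting rates are not those of the real dataset.

```python
import pandas as pd

# Hypothetical per-patient values; outcome coding is assumed: 1 = died, 0 = recovered.
patients = pd.DataFrame({
    "hsc":     [3.0, 20.0, 45.0, 49.0, 80.0, 150.0, 210.0, 30.0],
    "outcome": [0,   0,    0,    1,    1,    1,     1,     0],
})

def mortality_rate(df, low, high):
    """Share of patients who died among those with low <= hsc <= high."""
    in_range = df[df["hsc"].between(low, high)]
    return in_range["outcome"].mean()

low_rate = mortality_rate(patients, 0, 50)     # patients in the low HSC range
high_rate = mortality_rate(patients, 50, 300)  # patients in the high HSC range
print(low_rate, high_rate)
```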


How markers are correlated

Other features

For the next plot we want to see how thrombocytocrit and serum sodium affect the probability of not surviving COVID-19 at different stages of life, and how this differs between the sexes. To do this, we have to add an additional column with the stage of life.
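
One way to add such a column is to bin the age with `pd.cut`. This is only a sketch: the bin edges and the labels other than "young", "adult" and "senior" are assumptions, chosen so that the young and adult categories together cover everyone under 40.

```python
import pandas as pd

# Hypothetical ages; the real table has one row per patient.
patients = pd.DataFrame({"age": [18, 35, 52, 70, 88]})

# Assumed bin edges (left-inclusive): [0, 20), [20, 40), [40, 65), [65, 120).
bins = [0, 20, 40, 65, 120]
labels = ["young", "adult", "middle-aged", "senior"]

patients["life_stage"] = pd.cut(
    patients["age"], bins=bins, labels=labels, right=False
)
print(patients)
```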

Facet Scatter plot

In the chart above we can observe a few things.

First of all, there are not many people under 40 (the young and adult categories). This may mean that people at these stages of life have a lower probability of severe disease requiring hospitalization, or that in the early days of the pandemic (the samples were collected between 2020-01-10 and 2020-02-18) there was a greater need to take care of older people.

Secondly, seniors with a lower ratio of thrombocyte volume to plasma (thrombocytocrit) are in the high-risk group.

Thirdly, all people whose serum sodium is well above the norm (145 mmol/l) have died. This is only about 30 people.

Normal Scatter plot

In this plot we can observe that a very large share of the blood samples in which the albumin and calcium levels were both below the normal expected values (3.5 g/dl and 2.1 mmol/l respectively, marked by the horizontal and vertical lines) belong to people in the high-risk group.
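
The quadrant below both reference lines can be isolated with a boolean mask. A minimal sketch on made-up samples, assuming pandas; the column names and the coding of `outcome` (1 = died) are hypothetical, and only the two thresholds come from the text.

```python
import pandas as pd

ALBUMIN_MIN = 3.5  # g/dl, lower bound of the normal range quoted in the text
CALCIUM_MIN = 2.1  # mmol/l, lower bound of the normal range quoted in the text

# Hypothetical blood samples; outcome coding is assumed: 1 = died, 0 = recovered.
samples = pd.DataFrame({
    "albumin": [3.0, 4.0, 3.2, 3.8],
    "calcium": [2.0, 2.3, 2.2, 1.9],
    "outcome": [1, 0, 0, 1],
})

# True only where both values fall below their reference lines.
both_low = (samples["albumin"] < ALBUMIN_MIN) & (samples["calcium"] < CALCIUM_MIN)
print(samples[both_low])
```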

Violin plot

Finally, we want to see how the density of eosinophils in the population relates to the mortality rate, and how it differs between women and men at different ages.

Checking Hypothesis

At the end of Selecting correlated features we stated the hypothesis that, of all the white cells, only the neutrophil amount counts.

It is clearly visible that lymphocytes and monocytes have no impact on the outcome. The values for eosinophils and basophils are too small and too similar to draw any conclusion. Only in the case of neutrophils is the amount large enough, and sufficiently distinguishable, to say that it is important for health.